main paper
Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment
He, Shuaida, Chen, Liwen, Feng, Long
Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA under a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in which clients share only partial structure and a substantial subset may be contaminated. We propose Collaborative Low-rank Alignment and Identifiable Recovery (CLAIR), a contamination-aware framework that relies only on preliminary local estimators. Its formulation applies broadly, from linear regression to neural network and LLM modules, whenever local adaptation can be represented by matrix-valued updates. CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. We prove exact recovery of the shared LoRA subspace in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions. We further quantify the gain from CLAIR refinement: it reduces off-subspace estimation error through cross-client averaging while preserving client-specific variation within the shared LoRA subspace, and thus improves over local fine-tuning whenever this oracle gain outweighs the costs of subspace estimation and benign-client heterogeneity. Empirically, we demonstrate the benefits of CLAIR by fine-tuning a Transformer architecture on a text-copying task. The results show accurate contamination detection and improved benign-client performance compared with local fine-tuning and non-robust federated averaging.
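As a toy illustration of the noiseless shared-subspace setting only, the sketch below stacks matrix-valued updates from benign clients and recovers their common column subspace by truncated SVD. This is not the CLAIR algorithm: it has no contaminated clients and no block-sparse component, and every size and name (`d`, `r`, `k`, `n_clients`, `U_hat`) is a hypothetical choice made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k, n_clients = 20, 3, 5, 8  # hypothetical dimensions, not from the paper

# Shared subspace: orthonormal basis U of shape (d, r).
U, _ = np.linalg.qr(rng.normal(size=(d, r)))

# Each benign client has its own coefficients inside the shared subspace.
updates = [U @ rng.normal(size=(r, k)) for _ in range(n_clients)]

# Stack client updates side by side and keep the top-r left singular vectors.
stacked = np.hstack(updates)  # shape (d, k * n_clients), rank r
U_hat = np.linalg.svd(stacked, full_matrices=False)[0][:, :r]

# Compare subspaces through their projection matrices (basis-independent).
proj_err = np.linalg.norm(U @ U.T - U_hat @ U_hat.T)
print(f"projection error: {proj_err:.2e}")
```

With no noise and no contamination, the projection error is at the level of floating-point round-off; handling contaminated clients requires the structured decomposition described in the abstract.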
Coreset-Induced Conditional Velocity Flow Matching
Wang, Xiao, She, Zihua, Su, Jianxi
We propose Coreset-Induced Conditional Velocity Flow Matching (CCVFM), a generative model that augments hierarchical rectified flow with a data-informed source distribution. Hierarchical flow matching models the full conditional velocity law in velocity space, but its inner flow is asked to transport isotropic Gaussian noise to a multimodal target velocity distribution from scratch. Our key observation is that this inner source can be replaced by a closed-form surrogate built from a coreset of the target. CCVFM first compresses the target into weighted atoms using an entropic Sinkhorn coreset and lifts them to a Gaussian mixture. The induced conditional velocity law is then a closed-form Gaussian mixture that can be sampled without a learned neural sampler. A lightweight correction flow, trained from this exact surrogate source, then refines the remaining surrogate-to-target residual rather than learning an entire noise-to-data map. We prove that the surrogate transport cost equals the target--surrogate Wasserstein gap under an explicit compression assumption, whereas the noise-source analogue has a dimension-scale lower bound. We further characterize the conditional second moment of the direct surrogate-source training target and show that its source-dependent excess is small when the surrogate conditional law is close to the true conditional velocity law in mean and covariance. Empirically, on MNIST, CIFAR-10, ImageNet-32, and CelebA-HQ, the proposed method reaches competitive few-step generation under matched architectures.
220165f9c7f51163b73c8c7fff578b4e-Supplemental-Conference.pdf
This supplementary provides additional experiments as well as details that are required to reproduce our results. These were not included in the main paper due to space limitations. The supplementary is arranged as follows:
Section A: Details on Modelling
- Section A.1 Details of Theoretical Modelling
- Section A.2 Additional Details on CLEAM Algorithm
- Section A.3 Details on Fairness Metric
- Section A.4 Details of Significance of the Baseline Errors
Section B: Deeper Analysis on Error in Fairness Measurement
Section C: Validating Statistical Model for Classifier Output
- Section C.1 Validation of Sample-Based Estimate vs Model-Based Estimate
- Section C.2 Goodness-of-Fit Test: p̂ from the Real GANs with Our Theoretical Model
Section D: Additional Experimental Results
- Section D.1 Experimental Results with Standard Deviation
- Section D.2 Experimental Setup for Diversity
- Section D.3 Measuring Varying Degrees of Bias (Gender and BlackHair)
- Section D.4 Measuring Varying Degrees of ...
[Figure: pipeline diagram; recoverable labels: Preprocessing, Training step, f(x, v)]
In the following sections, we provide additional details about the network architecture, training, and experiments. The source code and WBC-SPH data set are published at https://github.com/
A.1 Implementation Details
We implement our neural network with TensorFlow (https://www.tensorflow.org). They also serve as the basis for the implementation of our antisymmetric CConv (ASCC) layer.
Axis for Mirroring
As mentioned in the main text, the mirror axis for ASCC layers can be chosen freely while fulfilling the requirements from theory. This provides a degree of freedom for implementation. We decided to use a fixed axis, which in our case corresponds to the spatial y-axis. While the mirroring could potentially be coupled to the spatial content of features, we found that a single, fixed axis for mirroring simplifies the implementation of the ASCCs, and hence is preferable in practice.
Additional Modifications
In addition to the properties of our algorithm as discussed in Section 2.3 and the ablation study in Section 3, we normalize the input data depending on the given gravitational direction in the model.
TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification: Appendix
For easy and reliable comparison, we report the average of all Rank-1 and mAP results on all test datasets over several random runs for the ablation study and parameter analysis. This is denoted by mAcc. There are three reasons why we use mAcc. First, it is a unified measure, which is convenient for algorithm comparison. Second, both Rank-1 and mAP are accuracy measures ranging from 0% to 100%, so averaging them is meaningful. Third, if one method's mAcc is 1% higher than another's, it means that, on average, every single measure on each dataset has increased by 1%, which is a perceptible improvement.
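The averaging behind mAcc can be sketched as follows. This is a minimal illustration, assuming the results of one run are stored as a mapping from dataset name to a (Rank-1, mAP) pair in percent; the function name `macc`, the dataset names, and all numbers are hypothetical and are not results from the paper.

```python
from statistics import mean

def macc(runs):
    """Average all Rank-1 and mAP values over all datasets and all runs.

    runs: list of dicts, one per random run, each mapping a dataset name
    to a (rank1, mAP) pair given in percent.
    """
    scores = [s for run in runs for pair in run.values() for s in pair]
    return mean(scores)

# Hypothetical numbers, for illustration only.
method_a = [
    {"dataset_1": (60.0, 30.0), "dataset_2": (50.0, 20.0)},
    {"dataset_1": (62.0, 32.0), "dataset_2": (48.0, 18.0)},
]
# A second method that is exactly one point better on every measure.
method_b = [{k: (r1 + 1.0, m + 1.0) for k, (r1, m) in run.items()}
            for run in method_a]

print(macc(method_a), macc(method_b))
```

Because mAcc is a plain average, a uniform one-point gain on every measure yields exactly a one-point gain in mAcc; conversely, a one-point mAcc gain corresponds to a one-point gain per measure only on average.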
Supplementary material: Benchmarking Deep Inverse Models over time, and the Neural-Adjoint method
Although performance over time is the main quantity that we want to benchmark, as pointed out by [3], posterior matching is another metric for measuring how good the inverse models are. Below we show the posterior matching score, using Maximum Mean Discrepancy (MMD) to measure how close the inferred posterior density is to the ground truth (rejection-sampled) distribution. Note that for a real-life problem (D4: meta-material) with higher dimensionality, rejection sampling becomes intractable. The three MMD kernels used were 0.05, 0.2 and 0.9. The code is also available in the repository.
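A minimal sketch of an MMD estimate between two sample sets is given below. The kernel form is an assumption: the text only lists the three values 0.05, 0.2 and 0.9, and here they are treated as scales h of a sum of inverse multiquadric kernels k(x, y) = h^2 / (h^2 + ||x - y||^2). The function names `imq_kernel` and `mmd2` are hypothetical and are not taken from the authors' repository.

```python
import numpy as np

def imq_kernel(a, b, scales=(0.05, 0.2, 0.9)):
    """Sum of inverse multiquadric kernels (assumed form) over the scales."""
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return sum(h ** 2 / (h ** 2 + d2) for h in scales)

def mmd2(x, y, scales=(0.05, 0.2, 0.9)):
    """Biased (V-statistic) estimate of the squared MMD between x and y."""
    return (imq_kernel(x, x, scales).mean()
            + imq_kernel(y, y, scales).mean()
            - 2.0 * imq_kernel(x, y, scales).mean())

rng = np.random.default_rng(0)
truth = rng.normal(size=(200, 2))            # stand-in for rejection samples
good = rng.normal(size=(200, 2))             # same distribution
bad = rng.normal(loc=3.0, size=(200, 2))     # shifted distribution

print(mmd2(truth, good), mmd2(truth, bad))
```

A lower score indicates that the inferred samples are closer to the reference samples; identical sample sets give a score of zero under this biased estimator.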
0004d0b59e19461ff126e3a08a814c33-AuthorFeedback.pdf
We sincerely thank the reviewers for their careful reading, constructive questions and suggestions. We would very much like further exchanges to improve our work, but the following is our best effort within the current limits. First, we address questions that appeared at least twice. We write P1, P2 for paragraph references, and Rx for reviewers. We discuss two main motivations here: lack of graph loss, and empirical failure of distinguishing power.